List of Flash News about benchmark limitations
| Time | Details |
|---|---|
| 2025-12-09 19:47 | Anthropic highlights SGTM study limits: small models, proxy evaluations, and no defense against in‑context attacks (trading implications)<br>According to @AnthropicAI, the SGTM study was run in a simplified setup using small models with proxy evaluations rather than standard benchmarks, limiting generalizability for production-scale systems, source: https://twitter.com/AnthropicAI/status/1998479616651178259. According to @AnthropicAI, SGTM does not stop in‑context attacks when an adversary supplies the information themselves, underscoring unresolved model misuse risks, source: https://twitter.com/AnthropicAI/status/1998479616651178259. According to @AnthropicAI, the post provides no standard benchmark results or references to financial or crypto assets, and it does not indicate any direct crypto market catalyst in this update, source: https://twitter.com/AnthropicAI/status/1998479616651178259. |